
Creators/Authors contains: "Zhang, Haowen"


  1. ABSTRACT

    Using recent empirical constraints on the dark matter halo–galaxy–supermassive black hole (SMBH) connection from z = 0–7, we infer how undermassive, typical, and overmassive SMBHs contribute to the quasar luminosity function (QLF) at z = 6. We find that beyond $L_\mathrm{bol} = 5 \times 10^{46}$ erg s$^{-1}$, the z = 6 QLF is dominated by SMBHs that are at least 0.3 dex above the z = 6 median $M_\bullet$–$M_*$ relation. The QLF is dominated by typical SMBHs (i.e. within ±0.3 dex around the $M_\bullet$–$M_*$ relation) at $L_\mathrm{bol} \lesssim 10^{45}$ erg s$^{-1}$. At z ∼ 6, the intrinsic $M_\bullet$–$M_*$ relation for all SMBHs is slightly steeper than the z = 0 scaling, with a similar normalization at $M_* \sim 10^{11} \, \mathrm{M}_\odot$. We also predict the $M_\bullet$–$M_*$ relation for z = 6 bright quasars selected by different bolometric luminosity thresholds, finding very good agreement with observations. For quasars with $L_\mathrm{bol} > 3 \times 10^{46}$ ($10^{48}$) erg s$^{-1}$, the scaling relation is shifted upwards by ∼0.35 (1.0) dex for $10^{11} \, \mathrm{M}_\odot$ galaxies. To accurately measure the intrinsic $M_\bullet$–$M_*$ relation, it is essential to include fainter quasars with $L_\mathrm{bol} \lesssim 10^{45}$ erg s$^{-1}$. At high redshifts, low-luminosity quasars are thus the best targets for understanding typical formation paths for SMBHs in galaxies.

     
  2. Abstract Motivation: Oxford Nanopore Technologies sequencing devices support adaptive sequencing, in which undesired reads can be ejected from a pore in real time. This feature allows targeted sequencing aided by computational methods for mapping partial reads, rather than complex library preparation protocols. However, existing mapping methods either require a computationally expensive base-calling procedure before using aligners to map partial reads or work well only on small genomes. Results: In this work, we present a new streaming method that can map nanopore raw signals for real-time selective sequencing. Rather than converting read signals to bases, we propose to convert reference genomes to signals and fully operate in the signal space. Our method features a new way to index reference genomes using k-d trees, a novel seed selection strategy and a seed chaining algorithm tailored toward the current signal characteristics. We implemented the method as a tool Sigmap. Then we evaluated it on both simulated and real data and compared it to the state-of-the-art nanopore raw signal mapper Uncalled. Our results show that Sigmap yields comparable performance on mapping yeast simulated raw signals, and better mapping accuracy on mapping yeast real raw signals with a 4.4× speedup. Moreover, our method performed well on mapping raw signals to genomes of size >100 Mbp and correctly mapped 11.49% more real raw signals of green algae, which leads to a significantly higher F1-score (0.9354 versus 0.8660). Availability and implementation: Sigmap code is accessible at https://github.com/haowenz/sigmap. Supplementary information: Supplementary data are available at Bioinformatics online.
  3. Abstract Studies of rest-frame optical emission in quasars at z > 6 have historically been limited by the wavelengths accessible by ground-based telescopes. The James Webb Space Telescope (JWST) now offers the opportunity to probe this emission deep into the reionization epoch. We report the observations of eight quasars at z > 6.5 using the JWST/NIRCam Wide Field Slitless Spectroscopy as a part of the “A SPectroscopic survey of biased halos In the Reionization Era (ASPIRE)” program. Our JWST spectra cover the quasars’ emission between rest frame ∼4100 and 5100 Å. The profiles of these quasars’ broad Hβ emission lines span a full width at half maximum from 3000 to 6000 km s$^{-1}$. The Hβ-based virial black hole (BH) masses, ranging from 0.6 to 2.1 billion solar masses, are generally consistent with their Mg II-based BH masses. The new measurements based on the more reliable Hβ tracer thus confirm the existence of billion-solar-mass BHs in the reionization epoch. In the observed [O III] λλ4960,5008 doublets of these luminous quasars, broad components are more common than narrow core components (≤ 1200 km s$^{-1}$), and only one quasar shows stronger narrow components than broad. Two quasars exhibit significantly broad and blueshifted [O III] emission, thought to trace galactic-scale outflows, with median velocities of −610 and −1430 km s$^{-1}$ relative to the [C II] 158 μm line. All eight quasars show strong optical Fe II emission and follow the eigenvector 1 relations defined by low-redshift quasars. The entire ASPIRE program will eventually cover 25 quasars and provide a statistical sample for the studies of the BHs and quasar spectral properties.
    Free, publicly-accessible full text available June 29, 2024
  4. Abstract Background: Third-generation single molecule sequencing technologies can sequence long reads, which is advancing the frontiers of genomics research. However, their high error rates prohibit accurate and efficient downstream analysis. This difficulty has motivated the development of many long read error correction tools, which tackle this problem through sampling redundancy and/or leveraging accurate short reads of the same biological samples. Existing studies to assess these tools use simulated data sets, and are not sufficiently comprehensive in the range of software covered or diversity of evaluation measures used. Results: In this paper, we present a categorization and review of long read error correction methods, and provide a comprehensive evaluation of the corresponding long read error correction tools. Leveraging recent real sequencing data, we establish benchmark data sets and set up evaluation criteria for a comparative assessment which includes quality of error correction as well as run-time and memory usage. We study how trimming and long read sequencing depth affect error correction in terms of length distribution and genome coverage post-correction, and the impact of error correction performance on an important application of long reads, genome assembly. We provide guidelines for practitioners for choosing among the available error correction tools and identify directions for future research. Conclusions: Despite the high error rate of long reads, the state-of-the-art correction tools can achieve high correction quality. When short reads are available, the best hybrid methods outperform non-hybrid methods in terms of correction quality and computing resource usage. When choosing tools, practitioners are advised to be careful with the few correction tools that discard reads, and to check the effect of error correction tools on downstream analysis. Our evaluation code is available as open-source at https://github.com/haowenz/LRECE.
  5. Abstract We present the first results from the JWST program A SPectroscopic survey of biased halos In the Reionization Era (ASPIRE). This program represents an imaging and spectroscopic survey of 25 reionization-era quasars and their environments by utilizing the unprecedented capabilities of NIRCam Wide Field Slitless Spectroscopy (WFSS) mode. ASPIRE will deliver the largest (∼280 arcmin$^2$) galaxy redshift survey at 3–4 μm among JWST Cycle 1 programs and provide extensive legacy values for studying the formation of the earliest supermassive black holes, the assembly of galaxies, early metal enrichment, and cosmic reionization. In this first ASPIRE paper, we report the discovery of a filamentary structure traced by the luminous quasar J0305–3150 and 10 [O III] emitters at z = 6.6. This structure has a 3D galaxy overdensity of $\delta_\mathrm{gal} = 12.6$ over 637 cMpc$^3$, one of the most overdense structures known in the early universe, and could eventually evolve into a massive galaxy cluster. Together with existing VLT/MUSE and ALMA observations of this field, our JWST observations reveal that J0305–3150 traces a complex environment where both UV-bright and dusty galaxies are present and indicate that the early evolution of galaxies around the quasar is not simultaneous. In addition, we discovered 31 [O III] emitters in this field at other redshifts, 5.3 < z < 6.7, with half of them situated at z ∼ 5.4 and 6.2. This indicates that star-forming galaxies, such as [O III] emitters, are generally clustered at high redshifts. These discoveries demonstrate the unparalleled redshift survey capabilities of NIRCam WFSS and the potential of the full ASPIRE survey data set.
    Free, publicly-accessible full text available June 29, 2024
  6. Abstract

    We report the discovery of an accreting supermassive black hole at z = 8.679. This galaxy, denoted here as CEERS_1019, was previously discovered as a Lyα-break galaxy by Hubble with a Lyα redshift from Keck. As part of the Cosmic Evolution Early Release Science (CEERS) survey, we have observed this source with JWST/NIRSpec, MIRI, NIRCam, and NIRCam/WFSS and uncovered a plethora of emission lines. The Hβ line is best fit by a narrow plus a broad component, where the latter is measured at 2.5σ with an FWHM ∼1200 km s$^{-1}$. We conclude this originates in the broad-line region of an active galactic nucleus (AGN). This is supported by the presence of weak high-ionization lines (N V, N IV], and C III]), as well as a spatial point-source component. The implied mass of the black hole (BH) is $\log(M_\mathrm{BH}/\mathrm{M}_\odot) = 6.95 \pm 0.37$, and we estimate that it is accreting at 1.2 ± 0.5 times the Eddington limit. The 1–8 μm photometric spectral energy distribution shows a continuum dominated by starlight and constrains the host galaxy to be massive ($\log(M/\mathrm{M}_\odot) \sim 9.5$) and highly star-forming (star formation rate, or SFR ∼ 30 $\mathrm{M}_\odot$ yr$^{-1}$; log sSFR ∼ −7.9 yr$^{-1}$). The line ratios show that the gas is metal-poor ($Z/\mathrm{Z}_\odot \sim 0.1$), dense ($n_e \sim 10^{3}$ cm$^{-3}$), and highly ionized ($\log U \sim -2.1$). We use this present highest-redshift AGN discovery to place constraints on BH seeding models and find that a combination of either super-Eddington accretion from stellar seeds or Eddington accretion from very massive BH seeds is required to form this object.

     
  7. Graph based non-linear reference structures such as variation graphs and colored de Bruijn graphs enable incorporation of full genomic diversity within a population. However, transitioning from a simple string-based reference to graphs requires addressing many computational challenges, one of which concerns accurately mapping sequencing read sets to graphs. Paired-end Illumina sequencing is a commonly used sequencing platform in genomics, where the paired-end distance constraints allow disambiguation of repeats. Many recent works have explored provably good index-based and alignment-based strategies for mapping individual reads to graphs. However, validating distance constraints efficiently over graphs is not trivial, and existing sequence to graph mappers rely on heuristics. We introduce a mathematical formulation of the problem, and provide a new algorithm to solve it exactly. We take advantage of the high sparsity of reference graphs, and use sparse matrix-matrix multiplications (SpGEMM) to build an index which can be queried efficiently by a mapping algorithm for validating the distance constraints. Effectiveness of the algorithm is demonstrated using real reference graphs, including a human MHC variation graph, and a pan-genome de-Bruijn graph built using genomes of 20 B. anthracis strains. While the one-time indexing time can vary from a few minutes to a few hours using our algorithm, answering a million distance queries takes less than a second. 
  8. Aligning DNA sequences to an annotated reference is a key step for genotyping in biology. Recent scientific studies have demonstrated improved inference by aligning reads to a variation graph, i.e., a reference sequence augmented with known genetic variations. Given a variation graph in the form of a directed acyclic string graph, the sequence to graph alignment problem seeks to find the best matching path in the graph for an input query sequence. Solving this problem exactly using a sequential dynamic programming algorithm takes quadratic time in terms of the graph size and query length, making it difficult to scale to high throughput DNA sequencing data. In this work, we propose the first parallel algorithm for computing sequence to graph alignments that leverages multiple cores and single-instruction multiple-data (SIMD) operations. We take advantage of the available inter-task parallelism, and provide a novel blocked approach to compute the score matrix while ensuring high memory locality. Using a 48-core Intel Xeon Skylake processor, the proposed algorithm achieves peak performance of 317 billion cell updates per second (GCUPS), and demonstrates near linear weak and strong scaling on up to 48 cores. It delivers significant performance gains compared to existing algorithms, and results in run-time reduction from multiple days to three hours for the problem of optimally aligning high coverage long (PacBio/ONT) or short (Illumina) DNA reads to an MHC human variation graph containing 10 million vertices. 
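
The sequential dynamic program that the sequence-to-graph alignment abstract describes as quadratic can be sketched as follows. This is only an illustrative baseline, not the parallel blocked algorithm of the paper: it assumes one character per vertex, vertices numbered in topological order, unit edit costs, and a path that may start and end at any vertex. The function name `align_to_dag` and its parameters are hypothetical.

```python
def align_to_dag(labels, preds, query):
    """Minimum edit distance between `query` and any path of a character-labeled DAG.

    labels[v] : character stored at vertex v (vertices are in topological order)
    preds[v]  : list of predecessors of v, each smaller than v
    Assumption: unit cost for substitution, insertion, and deletion.
    """
    m = len(query)
    # Virtual start row: i query characters consumed before entering the graph.
    start = list(range(m + 1))
    rows = []
    best = m  # aligning the query against an empty path costs m insertions
    for v, ch in enumerate(labels):
        # A path may start at v, so the virtual start row is always a candidate.
        prev_rows = [start] + [rows[u] for u in preds[v]]
        row = [0] * (m + 1)
        row[0] = min(p[0] for p in prev_rows) + 1  # vertex v left unmatched
        for i in range(1, m + 1):
            sub = 0 if query[i - 1] == ch else 1
            diag = min(p[i - 1] for p in prev_rows) + sub  # match or substitute
            skip_vertex = min(p[i] for p in prev_rows) + 1  # vertex v unmatched
            skip_query = row[i - 1] + 1                     # query char unmatched
            row[i] = min(diag, skip_vertex, skip_query)
        rows.append(row)
        best = min(best, row[m])  # the path may end at any vertex
    return best


# Toy variation graph spelling ACGA or ACTA (a single G/T variant).
LABELS = "ACGTA"
PREDS = [[], [0], [1], [1], [2, 3]]
```

Each vertex row depends only on the rows of its predecessors, so the work is proportional to the number of edges times the query length. That dependency structure is what a parallel or SIMD scheme has to respect.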
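
For the paired-end distance index described in the graph-mapping abstract above (item 7), the general idea of using sparse matrix products can be illustrated with boolean adjacency matrices: the k-th power of the adjacency matrix records which vertex pairs are joined by a walk of exactly k edges, and the union over an allowed range of k answers distance-constraint queries by lookup. The sketch below is a simplification under stated assumptions (unit-length vertices, a dict-of-sets sparse format, hypothetical names `spgemm_bool` and `build_distance_index`); the paper's actual formulation and implementation may differ.

```python
def spgemm_bool(a, b):
    """Boolean product of two sparse matrices stored as {row: set(columns)}."""
    out = {}
    for i, cols in a.items():
        reach = set()
        for k in cols:
            reach |= b.get(k, set())
        if reach:
            out[i] = reach
    return out


def build_distance_index(adj, d_min, d_max):
    """index[u] = vertices reachable from u by a walk of d_min..d_max edges."""
    vertices = set(adj) | {v for vs in adj.values() for v in vs}
    power = {u: {u} for u in vertices}  # identity matrix, i.e. walks of 0 edges
    index = {u: set() for u in vertices}
    for k in range(d_max + 1):
        if k > 0:
            power = spgemm_bool(power, adj)  # walks of exactly k edges
        if k >= d_min:
            for u, vs in power.items():
                index[u] |= vs
    return index


# Toy graph: 0 -> {1, 2} -> 3 -> 4
ADJ = {0: {1, 2}, 1: {3}, 2: {3}, 3: {4}, 4: set()}
INDEX = build_distance_index(ADJ, d_min=2, d_max=3)
```

A query such as "is vertex 4 within 2 to 3 edges of vertex 0?" then becomes the set membership test `4 in INDEX[0]`, which matches the abstract's split between a one-time indexing cost and fast queries.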